Add resource usage trackers and resource usage collector service #9890

Merged (23 commits) on Oct 16, 2023

@bharath-techie bharath-techie commented Sep 7, 2023

Description

  • This change adds 'NodeResourceUsageTracker', which collects moving averages of the CPU, memory and IO usage of the local node (IO changes are coming later).
  • It also adds 'ResourceUsageCollectorService', which collects resource usage stats of the local node and of downstream nodes on the coordinating node. In this PR, only the 'local' node stats are added to this service.
  • The resource usage stats collected by ResourceUsageCollectorService will be consumed by AdmissionControlService, where they will aid in rejection, and will also be used in ranking. These changes will come in later PRs.
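
As a rough illustration of the moving-average idea in the first bullet, here is a minimal Python sketch (the PR itself is Java). The class name `MovingAverageTracker`, the window size and the sample values are all assumptions made for illustration; the description does not state the tracker's actual window or polling interval.

```python
from collections import deque


class MovingAverageTracker:
    """Hypothetical fixed-window moving average, not the PR's Java class."""

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # A bounded deque drops the oldest sample once the window is full.
        self._samples = deque(maxlen=window_size)
        self._total = 0.0

    def record(self, value: float) -> None:
        # Subtract the sample that is about to be evicted, if the window is full.
        if len(self._samples) == self._samples.maxlen:
            self._total -= self._samples[0]
        self._samples.append(value)
        self._total += value

    def average(self) -> float:
        # Report 0.0 until the first sample arrives (an assumed convention).
        if not self._samples:
            return 0.0
        return self._total / len(self._samples)


cpu = MovingAverageTracker(window_size=3)
for sample in (10.0, 20.0, 30.0, 40.0):  # made-up CPU percentages
    cpu.record(sample)
cpu_average = cpu.average()  # mean of the last three samples only
```

Keeping a running total makes each update constant-time, at the cost of some floating-point drift over very long runs; recomputing the sum from the window on each read would be the simpler alternative.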

Node stats output :

"resource_usage_stats": {
    "BaPltK4SSw6iO8S9oueYCg": {
        "timestamp": 1697196310479,
        "cpu_utilization_percent": "0.0",
        "memory_utilization_percent": "31.0"
    }
}
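
The two percentages in this output are JSON strings, so a client has to convert them before doing arithmetic. A minimal Python sketch of reading the fragment, using the sample values shown above; the helper name `parse_resource_usage` is hypothetical and not part of any OpenSearch client:

```python
import json

# The fragment above, wrapped in braces so that it is a complete JSON document.
raw = """
{
  "resource_usage_stats": {
    "BaPltK4SSw6iO8S9oueYCg": {
      "timestamp": 1697196310479,
      "cpu_utilization_percent": "0.0",
      "memory_utilization_percent": "31.0"
    }
  }
}
"""


def parse_resource_usage(document: dict) -> dict:
    """Hypothetical helper: map node ID -> stats with numeric percentages."""
    parsed = {}
    for node_id, stats in document["resource_usage_stats"].items():
        parsed[node_id] = {
            "timestamp": stats["timestamp"],
            # The API returns these two fields as strings, so convert explicitly.
            "cpu_utilization_percent": float(stats["cpu_utilization_percent"]),
            "memory_utilization_percent": float(stats["memory_utilization_percent"]),
        }
    return parsed


usage = parse_resource_usage(json.loads(raw))
```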

Related Issues

#8910

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions bot commented Sep 7, 2023

Compatibility status:

Checks if related components are compatible with change bf7f65b

Incompatible components

Incompatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

@github-actions
Contributor

github-actions bot commented Sep 7, 2023

Compatibility status:

Checks if related components are compatible with change dae8cb6

Incompatible components

Skipped components

Compatible components

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT:
  • URL:
  • CommitID: 0c010d9
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testRestartPrimary

@sachinpkale merged commit 368d35a into opensearch-project:main on Oct 16, 2023
16 checks passed
@bharath-techie added the "backport 2.x" (Backport to 2.x branch) label on Oct 18, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9890-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 368d35abb398b7e379db281358529e4cb689ae05
# Push it to GitHub
git push --set-upstream origin backport/backport-9890-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-9890-to-2.x.

bharath-techie added a commit to bharath-techie/OpenSearch that referenced this pull request Oct 18, 2023
gbbafna pushed a commit that referenced this pull request Oct 18, 2023
…or service (#10695)

* Add resource usage trackers and resource usage collector service (#9890)

---------

Signed-off-by: Bharathwaj G <[email protected]>

* 2.x specific changes

Signed-off-by: Bharathwaj G <[email protected]>

---------

Signed-off-by: Bharathwaj G <[email protected]>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Oct 19, 2023
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Oct 23, 2023
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Oct 23, 2023
@rramachand21 added the "v2.12.0" (Issues and PRs related to version 2.12.0) label on Oct 31, 2023
@Jakob3xD

Jakob3xD commented Nov 9, 2023

@bharath-techie I came across this PR during my opensearch-go refactoring and noticed that it adds the field resource_usage_stats to the /_nodes/stats API. However, the node ID now appears both as the key of the node object and again as a key inside resource_usage_stats, which I find confusing and an unnecessary duplication. In addition, the percentages are exposed as strings rather than as numbers. This is what I mean:

{
  "nodes": {
    "<the-node-id>": {
      "resource_usage_stats":{
        "<the-node-id>":{
          "timestamp": 0,
          "cpu_utilization_percent": "5.5",
          "memory_utilization_percent": "17.2"
        }
      }
    }
  }
}

I am not familiar with Java or with the goal/proposal this refers to, but I have a question: isn't it possible to use the existing os, jvm or process stats from the _nodes/stats API?
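
For client authors who hit the same response shape, one possible workaround is to flatten the two levels of node ID into rows and convert the string percentages on the way. A minimal Python sketch under that assumption; the function name `flatten_resource_usage` and the node ID `node-a` (standing in for `<the-node-id>`) are made up for illustration:

```python
def flatten_resource_usage(response: dict) -> list:
    """Hypothetical helper: one row per (reporting node, measured node) pair."""
    rows = []
    for reporting_node, node_stats in response.get("nodes", {}).items():
        # Nodes without the field are skipped rather than treated as errors.
        usage = node_stats.get("resource_usage_stats", {})
        for measured_node, stats in usage.items():
            rows.append({
                "reporting_node": reporting_node,  # outer key under "nodes"
                "measured_node": measured_node,    # inner key under "resource_usage_stats"
                "timestamp": stats["timestamp"],
                "cpu_utilization_percent": float(stats["cpu_utilization_percent"]),
                "memory_utilization_percent": float(stats["memory_utilization_percent"]),
            })
    return rows


# Same shape as the example above, with a made-up node ID.
sample_response = {
    "nodes": {
        "node-a": {
            "resource_usage_stats": {
                "node-a": {
                    "timestamp": 0,
                    "cpu_utilization_percent": "5.5",
                    "memory_utilization_percent": "17.2",
                }
            }
        }
    }
}

rows = flatten_resource_usage(sample_response)
```

Keeping both IDs in each row means the same code would still work if the inner map ever lists nodes other than the reporting one.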

@bharath-techie
Contributor Author

Hi @Jakob3xD ,
This is the initial implementation, in which the stats carry only the local node's stats stored in 'ResourceUsageCollectorService'. In future PRs, we will implement ways to store downstream nodes' data on the coordinator node in 'ResourceUsageCollectorService', and that data will be exposed in the same stats (so it will be a list of nodes and their resource usage stats).
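
To make the reason for the inner node-ID key concrete, here is a small Python sketch of a collector that stores stats in a map keyed by node ID, holding one entry today and more once downstream nodes are added. It is not the PR's Java code: the names `NodeResourceUsage` and `ResourceUsageCollector`, the node IDs, the sample values and the "keep the newest sample" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeResourceUsage:
    """Hypothetical per-node sample, mirroring the fields in the stats output."""
    node_id: str
    timestamp: int
    cpu_utilization_percent: float
    memory_utilization_percent: float


class ResourceUsageCollector:
    """Hypothetical collector keyed by node ID."""

    def __init__(self) -> None:
        self._by_node = {}

    def collect(self, usage: NodeResourceUsage) -> None:
        # Assumed rule: keep only the most recent sample for each node.
        current = self._by_node.get(usage.node_id)
        if current is None or usage.timestamp >= current.timestamp:
            self._by_node[usage.node_id] = usage

    def stats(self) -> dict:
        # Percentages are rendered as strings to mirror the output shown in this PR.
        return {
            node_id: {
                "timestamp": usage.timestamp,
                "cpu_utilization_percent": str(usage.cpu_utilization_percent),
                "memory_utilization_percent": str(usage.memory_utilization_percent),
            }
            for node_id, usage in self._by_node.items()
        }


collector = ResourceUsageCollector()
# Today: only the local node reports into the collector.
collector.collect(NodeResourceUsage("local-node", 1000, 0.0, 31.0))
local_only = collector.stats()
# Later: a downstream node's stats would land in the same map.
collector.collect(NodeResourceUsage("downstream-1", 1001, 12.5, 40.0))
with_downstream = collector.stats()
```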

austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Dec 13, 2023
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Labels
backport 2.x (Backport to 2.x branch), backport-failed, v2.12.0 (Issues and PRs related to version 2.12.0)
8 participants